System and Method for Scene-Space Video Processing

ABSTRACT

There is provided a video processing system for use with a video having frames including a first frame and neighboring frames of the first frame. The system includes a memory storing a video processing application, and a processor. The processor is configured to execute the video processing application to sample scene points corresponding to an output pixel of the first frame of the frames of the video, the scene points including alternate observations of a same scene point from the neighboring frames of the first frame of the video, and filter the scene points corresponding to the output pixel to determine a color of the output pixel by calculating a weighted combination of the scene points corresponding to the output pixel.

BACKGROUND

Many compelling video processing effects can be achieved if per-pixel depth information and three-dimensional (3D) camera calibrations are known. Scene-space video processing, where pixels are processed according to their 3D positions, has many advantages over traditional image-space processing. For example, handling camera motion, occlusions, and temporal continuity entirely in two-dimensional (2D) image-space can in general be very challenging, while dealing with these issues in scene-space is simple. As scene-space information becomes more and more widely available due to advances in tools and mass-market hardware devices, techniques that leverage depth information will play an important role in future video processing approaches. However, the success of such methods is highly dependent on the accuracy of the scene-space information.

SUMMARY

The present disclosure is directed to systems and methods for scene-space video processing, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a diagram of an exemplary video processing system, according to one implementation of the present disclosure;

FIG. 2 shows an exemplary diagram of scene-space video processing, according to one implementation of the present disclosure;

FIG. 3 shows a diagram of an exemplary scene-space point cloud, according to one implementation of the present disclosure;

FIG. 4 shows examples of scene-space video processing effects, according to one implementation of the present disclosure;

FIG. 5 shows examples of scene-space video processing effects, according to one implementation of the present disclosure;

FIG. 6 shows examples of scene-space video processing effects and related information, according to one implementation of the present disclosure;

FIG. 7 shows an example of a scene-space video processing effect, according to one implementation of the present disclosure;

FIG. 8 shows an exemplary flowchart illustrating a method of scene-space video processing, according to one implementation of the present disclosure;

FIG. 9 shows an exemplary flowchart illustrating a method of scene-space sampling, according to one implementation of the present disclosure; and

FIG. 10 shows an exemplary flowchart illustrating a method of scene-space filtering, according to one implementation of the present disclosure.

DETAILED DESCRIPTION

The following description contains specific information pertaining to implementations in the present disclosure. The drawings in the present application and their accompanying detailed description are directed to merely exemplary implementations. Unless noted otherwise, like or corresponding elements among the figures may be indicated by like or corresponding reference numerals. Moreover, the drawings and illustrations in the present application are generally not to scale, and are not intended to correspond to actual relative dimensions.

FIG. 1 shows a diagram of an exemplary video processing system, according to one implementation of the present disclosure. As shown, video processing system 100 includes device 110 and display 195. Device 110 includes processor 120 and memory 130. Processor 120 may access memory 130 to store received input or to execute commands, processes, or programs stored in memory 130. Processor 120 may be a microprocessor or a similar hardware processor used in a computing device. Memory 130 is a non-transitory hardware storage device capable of storing data, commands, processes, and programs for execution by processor 120. As shown in FIG. 1, memory 130 includes video 140 and video processing application 150.

Video 140 may be video content including a plurality of frames. Each frame of video 140 may include a plurality of scene points, where a scene point may be a portion of a frame that is visible in a pixel of a frame of video 140 when displayed on display 195.

Video processing application 150 includes sampling module 151 and filtering module 153. For each pixel of an output frame of video 140, video processing application 150 may sample a plurality of scene points. In some implementations, a sample may include all scene points that lie within a 3D frustum defined by an output pixel in the output frame. Video processing application 150 may then filter this sample set to determine a color of the output pixel by weighting the samples appropriately. Video processing application 150 may compute output color O(p) for each pixel p in an output frame of video 140. For each O(p), video processing application 150 may sample a set of scene points S(p) directly from an input video I. A scene point s ∈ R⁷ is composed of color (s_(rgb) ∈ R³), scene-space position (s_(xyz) ∈ R³), and frame time (s_(f) ∈ R).
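
By way of illustration only, and not as part of the disclosed implementation, the following Python sketch shows one possible representation of a 7D scene-point sample as described above; the class and field names are hypothetical.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ScenePointSample:
    """A 7D scene-point sample s: color, scene-space position, and frame time."""
    rgb: np.ndarray   # s_rgb in R^3, color of the observation
    xyz: np.ndarray   # s_xyz in R^3, scene-space position
    f: float          # s_f in R, frame time of the observation

    def as_vector(self) -> np.ndarray:
        """Stack the components into a single 7D vector."""
        return np.concatenate([self.rgb, self.xyz, [self.f]])

# Example: one observation of a scene point seen in frame 12
s = ScenePointSample(rgb=np.array([0.4, 0.5, 0.6]),
                     xyz=np.array([1.2, -0.3, 4.8]),
                     f=12.0)
print(s.as_vector().shape)  # (7,)
```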

Video processing application 150 may also perform preprocessing of video 140. In some implementations, video processing application 150 may derive camera calibration parameters (extrinsics and intrinsics), C, and depth information, D, from the input video I. Images may be processed in an approximately linear color space by gamma correction. Video processing application 150 may compute camera calibration parameters automatically using commonly available commercial tools. Video processing application 150 may derive a depth map from input video I and camera calibration parameters C using multi-view stereo techniques, or from information provided by a depth sensor, such as a Kinect® sensor. Video processing application 150 may use a simple, local depth estimation algorithm where the standard multi-view stereo data term may be computed over a temporal window around each frame. For each pixel, this entails searching along a set of epipolar lines defined by C, and picking the depth value with the lowest average cost using, for example, the sum of squared RGB color differences on 3×3 patches. This simple approach does not include any smoothness term, and therefore does not require any complex global optimization scheme, rendering it easy to implement and efficient to compute. The calculation may yield many local depth outliers, introducing high-frequency "salt-and-pepper" noise in the depth map.
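
By way of illustration only, the following Python sketch outlines a simplified, unoptimized version of such a local depth estimate. It replaces the explicit epipolar-line search with a per-pixel sweep over candidate depths, which visits the same epipolar positions up to discretization; cameras are assumed to be given as (K, R, t) with x_cam = R·X + t, and all function names are hypothetical.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D points X (N,3) into pixel coordinates (N,2)."""
    x_cam = X @ R.T + t
    x_img = x_cam @ K.T
    return x_img[:, :2] / x_img[:, 2:3]

def unproject(K, R, t, u, v, depth):
    """Back-project pixel (u, v) at camera-space depth into a world-space 3D point."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return R.T @ (depth * ray - t)

def patch_ssd(img_a, pa, img_b, pb, r=1):
    """Sum of squared RGB differences over (2r+1)x(2r+1) patches centered at pa and pb."""
    ax, ay = int(round(pa[0])), int(round(pa[1]))
    bx, by = int(round(pb[0])), int(round(pb[1]))
    h, w = img_a.shape[:2]
    hb, wb = img_b.shape[:2]
    if not (r <= ax < w - r and r <= ay < h - r and r <= bx < wb - r and r <= by < hb - r):
        return np.inf
    patch_a = img_a[ay - r:ay + r + 1, ax - r:ax + r + 1]
    patch_b = img_b[by - r:by + r + 1, bx - r:bx + r + 1]
    return float(np.sum((patch_a - patch_b) ** 2))

def estimate_depth(ref_img, ref_cam, neighbor_imgs, neighbor_cams, depth_candidates):
    """Winner-takes-all local depth estimate: for each pixel, test candidate depths
    and keep the one with the lowest average patch cost over the temporal window."""
    h, w = ref_img.shape[:2]
    K, R, t = ref_cam
    depth_map = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            best_cost, best_depth = np.inf, depth_candidates[0]
            for d in depth_candidates:
                X = unproject(K, R, t, x, y, d)
                costs = [patch_ssd(ref_img, (x, y), img_j,
                                   project(Kj, Rj, tj, X[None, :])[0])
                         for img_j, (Kj, Rj, tj) in zip(neighbor_imgs, neighbor_cams)]
                cost = np.mean(costs)
                if cost < best_cost:
                    best_cost, best_depth = cost, d
            depth_map[y, x] = best_depth
    return depth_map
```

As noted above, the absence of a smoothness term keeps this approach simple but leaves salt-and-pepper outliers in the resulting depth map, which the later filtering stage is designed to tolerate.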

Sampling module 151 may sample a plurality of scene points corresponding to a frame or a plurality of frames of video 140, or to an output pixel of an output frame of video 140. In some implementations, sampling module 151 may sample scene points corresponding to the output frame and neighboring frames of video 140. Neighboring frames may include a frame that is immediately before the output frame in video 140, a frame that is immediately after the output frame in video 140, a plurality of frames sequentially preceding the output frame in video 140, a plurality of frames sequentially following the output frame in video 140, or a combination of frames before and after the output frame. In some implementations, sampling module 151 may determine a sample set of scene points corresponding to an output pixel of the output frame of video 140. In some implementations, sampling module 151 may create a point cloud by projecting a plurality of scene points visible to a pixel or a plurality of pixels in an input frame I using camera matrix C, based on the respective depth value D(p) of each of the scene points. In some implementations, sampling module 151 may form the point cloud by projecting scene points from a plurality of frames, including the output frame and neighboring frames. By sampling the output frame and neighboring frames, sampling module 151 may include multiple observations of the same scene point visible to the output pixel in the sample set.
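
By way of illustration only, the following Python sketch shows one way such a point cloud could be formed by unprojecting every pixel of each selected frame with its depth value and camera calibration; the cameras are again assumed to be given as (K, R, t), and the function names are hypothetical.

```python
import numpy as np

def frame_to_cloud_points(image, depth, K, R, t):
    """Unproject every pixel of one frame into scene-space using its depth value D(p),
    producing cloud points with attached color."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).astype(float)
    rays = pixels @ np.linalg.inv(K).T          # camera-space rays with z = 1
    cam_points = rays * depth.reshape(-1, 1)    # scale each ray by its depth
    world_points = (cam_points - t) @ R         # invert x_cam = R X + t  ->  X = R^T (x_cam - t)
    colors = image.reshape(-1, image.shape[-1])
    return world_points, colors

def build_point_cloud(frames, depths, cameras, frame_times):
    """Form a scene-space point cloud from an output frame and its neighboring frames."""
    xyz, rgb, times = [], [], []
    for img, d, (K, R, t), ft in zip(frames, depths, cameras, frame_times):
        pts, cols = frame_to_cloud_points(img, d, K, R, t)
        xyz.append(pts)
        rgb.append(cols)
        times.append(np.full(len(pts), ft))
    return np.concatenate(xyz), np.concatenate(rgb), np.concatenate(times)
```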

Filtering module 153 may determine an output color for each output pixel in the output frame based on a plurality of sampled scene points. Filtering may be defined as a function Φ that maps a set of 7D samples S to an output color in R³. Among the scene points in the sample set, some will correspond to valid observations of the scene point, but others will come from erroneous observations. Erroneous observations may include observations of occlusion events, incorrect 3D information, or observations of moving objects. To calculate the color of the output pixel, filtering module 153 may use a weighting function to emphasize scene point observations that are not erroneous observations, and de-emphasize the contribution of erroneous observations. In some implementations, filtering module 153 may use a filtering function of the form:

$\begin{matrix}{{O(p)} = {{\Phi \left( {S(p)} \right)} = {\frac{1}{W}{\sum\limits_{s \in {S{(p)}}}\; {{w(s)}s_{rgb}}}}}} & (1)\end{matrix}$

where w(s) is a video processing effect-specific weighting function and W = Σ_(s∈S(p)) w(s) is the sum of all weights.
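
By way of illustration only, the following minimal Python sketch applies Equation (1) to a set of gathered samples once their weights have been computed; the function name is hypothetical.

```python
import numpy as np

def filter_samples(samples_rgb, weights):
    """Equation (1): output color as the weight-normalized sum of sample colors,
    O(p) = (1/W) * sum_s w(s) * s_rgb, with W = sum_s w(s)."""
    weights = np.asarray(weights, dtype=float)
    samples_rgb = np.asarray(samples_rgb, dtype=float)
    W = weights.sum()
    if W <= 0.0:
        return np.zeros(3)   # no usable samples for this output pixel
    return (weights[:, None] * samples_rgb).sum(axis=0) / W

# Example: three observations of the same scene point, the last one an outlier
colors = [[0.20, 0.30, 0.40], [0.22, 0.29, 0.41], [0.90, 0.10, 0.05]]
print(filter_samples(colors, weights=[1.0, 1.0, 0.05]))
```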

In some implementations, filtering module 153 may calculate a weighted combination of the plurality of scene points corresponding to an output pixel of video 140 to determine a video processing effect. Filtering module 153 may determine a video processing effect by applying different weighting functions w(s) to the 7D samples in the sample set. In some implementations, a video processing effect may be determined by a video processing effect-specific weighting function. In particular, it is straightforward to specify effects based on scene-space coordinates by making w(s) depend on the scene-space position of a sample.

Display 195 may be a display suitable for displaying videos, video processing, and video processing effects. In some implementations, display 195 may be a television, a computer monitor, a display of a smart phone, or a display of a tablet computer. Display 195 may include a light-emitting diode (LED) display, an organic LED (OLED) display, a liquid crystal display (LCD), a plasma display panel (PDP), or another display suitable for viewing and processing videos. In some implementations, display 195 may be included in device 110.

FIG. 2 shows an exemplary diagram of scene-space video processing, according to one implementation of the present disclosure. At 201, a selected number of frames are projected into scene-space, including output frame 251 and neighboring frames of video 140. Sampling module 151 may project scene points in the selected number of frames to form a point cloud, where each scene point has a corresponding cloud point 252.

At 202, sampling module 151 identifies all cloud points 252 that fall in frustum V of an output pixel and within the output pixel, and the cloud points 252 that are within the projection of frustum V, but fall outside of the output pixel. At 203, sampling module 151 identifies frustum V defined by a pixel in output frame O. In order to find which cloud points 252 project into frustum V, video processing system 100 looks at the projection of frustum V into a single frame J. All cloud points 252 that project into V must reside inside the respective 2D convex hull V_(J) (determined by projecting the frustum V into J), as shown in FIG. 2. Video processing system 100 operates on projected cloud points 258 that lie inside the area of V_(J) in the image domain, and determines the required samples that should be validated or checked.

For example, given output camera matrix C_(O), the 3D frustum volume V of a pixel p is simply defined as a standard truncated pyramid using the pixel location (p_(x), p_(y)) and a frustum size l:

$V = \left\{ C_{O}^{-1} \cdot \left[\, p_{x} \pm \tfrac{l}{2},\; p_{y} \pm \tfrac{l}{2},\; \{\mathit{near}, \mathit{far}\},\; 1 \,\right]^{T} \right\}$  (2)

The 2D frustum hull V_(J) is obtained by individually projecting the 3D vertices of frustum V into J, and connecting the projected vertices in J. Because projected cloud points 258 that fall inside of V_(J) may correspond to cloud points that lie in front of or behind frustum V, video processing system 100 cannot simply accept all projected cloud points that fall within V_(J).

At 204, video processing application 150 rasterizes all projected cloud points 258 that fall within V_(J), and sampling module 151 checks whether their projection back into the output frame falls within V_(O). Sampling module 151 checks each pixel q in V_(J) to determine whether it maps to a position in O that falls within V_(O). Specifically, video processing system 100 checks the distance from the projected cloud point mapped back into O to the original output pixel p:

$\begin{matrix}{{{{p - {C_{O} \cdot C_{J}^{- 1} \cdot \left\lbrack {q_{x},q_{y},q_{d},1} \right\rbrack^{T}}}}1} < \frac{l}{2}} & (3)\end{matrix}$

Scene points corresponding to cloud points that are within projected frustum V_(J) and that map to a position within the original output pixel are added to the sample set. Arrow 255 indicates a projected cloud point that satisfies the conditions to be sampled, while arrows 257 indicate projected cloud points that were tested, but rejected. A projected cloud point that passes this test is converted into a 7D sample and added to the sample set S(p).
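
By way of illustration only, the following Python sketch performs a simplified version of this membership test: each candidate cloud point is projected directly into the output camera and accepted if it lands within l/2 of the output pixel. This omits the rasterization of V_(J) used above for efficiency, and uses a Euclidean pixel distance; all names and parameters are hypothetical.

```python
import numpy as np

def project_to_pixel(K, R, t, X):
    """Project a world-space point X into pixel coordinates of the given camera."""
    x_cam = K @ (R @ X + t)
    return x_cam[:2] / x_cam[2]

def gather_samples_for_pixel(p, l, out_cam, cloud_xyz, cloud_rgb, cloud_time):
    """Collect the 7D samples whose cloud points project to within l/2 pixels of
    output pixel p, i.e. the points that fall inside the pixel's frustum V."""
    K, R, t = out_cam
    samples = []
    for X, rgb, ft in zip(cloud_xyz, cloud_rgb, cloud_time):
        q = project_to_pixel(K, R, t, X)
        if np.linalg.norm(q - np.asarray(p, dtype=float)) < l / 2.0:
            samples.append(np.concatenate([rgb, X, [ft]]))   # 7D sample
    return np.array(samples)

# Example: widen the frustum (l = 3) to tolerate depth and calibration errors
# samples = gather_samples_for_pixel(p=(120, 64), l=3, out_cam=(K, R, t),
#                                    cloud_xyz=xyz, cloud_rgb=rgb, cloud_time=times)
```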

At 205, filtering module 153 determines the color of output pixel 296 by calculating a weighted combination of the plurality of scene points corresponding to the output pixel. In the case of error-free depth maps, camera poses, and a static scene, the cloud points inside frustum V, where l=1, would be a complete set of all observations of the scene points corresponding to the cloud points, as well as any occluded scene points. However, inaccuracies in camera pose and depth may result in erroneous observations, including false positives, i.e., outlier samples wrongly gathered, and false negatives, i.e., scene point observations that are missed. In some implementations, to account for depth and camera calibration inaccuracies, sampling module 151 may increase the per-pixel frustum size l to cover a wider range, such as l=3 pixels.

FIG. 3 shows a diagram of an exemplary scene-space point cloud, according to one implementation of the present disclosure. Depth map 302 corresponds to input 301. Based on depth map 302, sampling module 151 projects input 301 and a plurality of neighboring frames to create point cloud 303, including a plurality of cloud points corresponding to scene points in each of the projected frames. Point cloud 303 shows a side view of five images projected into scene-space.

FIG. 4 shows examples of scene-space video processing effects, according to one implementation of the present disclosure. Filtering module 153 may use different calculations to filter the plurality of scene points sampled by sampling module 151. In some implementations, filtering module 153 may be used to create different video processing effects. Diagram 400 shows examples of a denoising effect at 401, and a deblurring effect at 402.

As the same scene point is observed in a plurality of frames of video 140, video processing system 100 can use these multiple observations to denoise frames of video 140. Averaging all samples in S(p) by setting the weighting function w(s) equal to one may result in occluded scene points and noisy samples corrupting the result. Filtering is therefore performed as a weighted sum of samples, where weights are computed as a multivariate normal distribution with mean s_(ref):

$\begin{matrix}{{w_{d \in {noise}}(s)} = {\exp\left( {- \frac{\left( {s_{ref} - s} \right)^{2}}{2\; \sigma^{2}}} \right)}} & (4)\end{matrix}$

Input frame 402 a depicts an input frame of video 140 consisting of a blurry image. At 402 b, an example of the output frame after applying scene-space deblurring shows that the "Pay Here" sign is legible. Video processing system 100 can deblur video frames that are blurry as a result of sudden camera movements, such as shaking during hand-held capture, using the same equation used for denoising, modified by a measure of frame blurriness:

$\begin{matrix}{{w_{d \in {blur}}(s)} = {{\exp\left( {- \frac{\left( {s_{ref} - s} \right)^{2}}{2\; \sigma^{2}}} \right)}{\sum\limits_{q \in I^{s_{f}}}\; {{\nabla{I^{s_{f}}(q)}}}}}} & (5)\end{matrix}$

where ∇ is the gradient operator, and I^(s_f) is the frame from which sample s originated. The first part is the same multivariate normal distribution as in Equation 4, and the second part is a measure of frame blurriness computed as the sum of gradient magnitudes in the image from which s was sampled. This de-emphasizes the contribution from blurry frames when computing an output color. When implementing the video processing effect of deblurring, filtering module 153 may use parameters such as σ_(rgb)=200, σ_(xyz)=10, σ_(f)=20.

While the above notation may be used for clarity, video processing application 150 represents samples in a 7D space using a diagonal covariance matrix, with diagonal entries σ_(rgb) for the three color dimensions, σ_(xyz) for the scene-space position, and σ_(f) for the frame time. For denoising, filtering module 153 may use parameters such as σ_(rgb)=40, σ_(xyz)=10, σ_(f)=6.
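
By way of illustration only, the following Python sketch computes the denoising weight of Equation (4) with the diagonal covariance just described, and the deblurring weight of Equation (5); s and s_ref are assumed to be 7D vectors as in the earlier sketches, and the blurriness measure is approximated on the luminance channel.

```python
import numpy as np

def gaussian_weight(s, s_ref, sigma_rgb=40.0, sigma_xyz=10.0, sigma_f=6.0):
    """Multivariate normal weight (Equation 4) with a diagonal covariance:
    sigma_rgb for the color dimensions, sigma_xyz for position, sigma_f for frame time."""
    sigmas = np.array([sigma_rgb] * 3 + [sigma_xyz] * 3 + [sigma_f])
    d = (np.asarray(s, dtype=float) - np.asarray(s_ref, dtype=float)) / sigmas
    return float(np.exp(-0.5 * np.dot(d, d)))

def frame_sharpness(frame):
    """Sum of gradient magnitudes of a frame; low values indicate a blurry frame."""
    gy, gx = np.gradient(frame.mean(axis=-1))   # luminance gradients
    return float(np.sum(np.hypot(gx, gy)))

def deblur_weight(s, s_ref, source_frame,
                  sigma_rgb=200.0, sigma_xyz=10.0, sigma_f=20.0):
    """Equation (5): the Gaussian weight scaled by the sharpness of the frame the
    sample came from, de-emphasizing observations taken from blurry frames."""
    return gaussian_weight(s, s_ref, sigma_rgb, sigma_xyz, sigma_f) * frame_sharpness(source_frame)
```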

FIG. 5 shows examples of scene-space video processing effects, according to one implementation of the present disclosure. Filtering module 153 may also perform a scene-space form of super resolution to create a high-resolution output video O′ from a low-resolution input video I. To apply a super resolution effect, filtering module 153 applies a weighting scheme that emphasizes observations of scene points with the highest available resolution. Filtering module 153 determines that each scene point is most clearly recorded when it is observed from as close as possible (i.e., the sample with the smallest projected area in scene-space). To measure this, filtering module 153 applies a scene-space area property, s_(area). The scene-space area of a sample is computed by projecting its pixel corners into the scene and computing the area of the resulting quad; assuming the output pixels are square, it is sufficient to compute the length of one edge. In some implementations, filtering module 153 may let p_(l) and p_(r) be the left and right edge pixel locations of a sample located at p, and C be the camera matrix for the sample's frame s_(f):

$s_{area} = \left\| C^{-1} \cdot \left[\, p_{l}, D(p), 1 \,\right]^{T} - C^{-1} \cdot \left[\, p_{r}, D(p), 1 \,\right]^{T} \right\|_{2}^{2}$  (6)

Filtering module 153 applies the weighting function:

$\begin{matrix}{{w_{ar}(s)} = {{\exp\left( {- \frac{\left( {s_{ref} - s} \right)^{2}}{2\; \sigma^{2}}} \right)}{\exp\left( {- \frac{{s_{area}}^{2}}{2\; \sigma_{area}}} \right)}}} & (7)\end{matrix}$

The latter term de-emphasizes scene point observations that were observed from farther away, and emphasizes scene point observations with more detailed information. In order to generate reference samples s_(ref) in this case, video processing system 100 bilinearly upsamples I to the output resolution. Because sampling module 151 allows samples to be gathered from arbitrary pixel frustums, super resolution uses samples from frustums corresponding to pixel coordinates from O′, rather than O. For scene-space super resolution, filtering module 153 may use parameters such as σ_(rgb)=50 and σ_(area)=0.02.
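
By way of illustration only, the following Python sketch computes the scene-space edge length of Equation (6) and the super resolution weight of Equation (7). The unprojection is performed in camera space, which preserves the edge length; the values of σ_(xyz) and σ_(f) are assumptions (the text above specifies only σ_(rgb) and σ_(area)), and all names are hypothetical.

```python
import numpy as np

def sample_area(K, p_left, p_right, depth):
    """Equation (6): squared length of one projected pixel edge in scene-space,
    used as a proxy for the scene-space area of the sample."""
    Kinv = np.linalg.inv(K)
    edge_l = depth * (Kinv @ np.array([p_left[0],  p_left[1],  1.0]))
    edge_r = depth * (Kinv @ np.array([p_right[0], p_right[1], 1.0]))
    return float(np.sum((edge_l - edge_r) ** 2))

def super_resolution_weight(s, s_ref, s_area,
                            sigma_rgb=50.0, sigma_xyz=10.0, sigma_f=20.0,
                            sigma_area=0.02):
    """Equation (7): Gaussian similarity to the (bilinearly upsampled) reference sample,
    scaled by a term that favors observations with a small scene-space area."""
    sigmas = np.array([sigma_rgb] * 3 + [sigma_xyz] * 3 + [sigma_f])
    d = (np.asarray(s, float) - np.asarray(s_ref, float)) / sigmas
    similarity = np.exp(-0.5 * np.dot(d, d))
    detail = np.exp(-(s_area ** 2) / (2.0 * sigma_area ** 2))
    return float(similarity * detail)
```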

Diagram 500 shows an example of scene-space super resolution at 501. 501 a shows an input frame, and 501 b shows the result of scene-space super resolution, showing significantly higher resolution, including legible words appearing on the globe in 501 b.

At 502, diagram 500 shows an example of the video processing effect of object semi-transparency. In some implementations, object semi-transparency may be used to "see through" objects by displaying content that is observed behind the object in neighboring frames. 502 a shows an input frame of video 140. Object semi-transparency requires a user to specify which objects should be made transparent, either by providing per-frame image masks M, where M(p)=1 indicates that the pixel should be removed, or a scene-space bounding region. 502 b shows a 3D mask of input frame 502 a, and 502 c shows the mask projected into input frame 502 a. When a scene-space bounding region is used, filtering module 153 projects all samples that fall into the scene-space bounding region back into the original images to create M. An example of scene-space object semi-transparency is shown at 502 d.

When applying video processing effects including object semi-transparency and inpainting, filtering module 153 may not have a reference s_(ref) in S(p) for the mask region. In such situations, filtering module 153 may instead compute an approximate reference sample by taking the mean of all samples,

$s_{ref} = \frac{1}{\left| S(p) \right|} \sum_{s \in S(p)} s$  (8)

and weight samples with the following function,

$\begin{matrix}{{w_{inpoint}(s)} = \left\{ \begin{matrix}{\exp\left( {- \frac{\left( {s_{ref} - s} \right)^{2}}{2\; \sigma^{2}}} \right)} & {{{when}\mspace{14mu} {M\left( s_{p} \right)}} = 0} \\0 & {{{when}\mspace{14mu} {M\left( s_{p} \right)}} = 1}\end{matrix} \right.} & (9)\end{matrix}$

Applying this weighting function, filtering module 153 computes a weighted combination of samples based on their proximity to the mean sample. If video processing application 150 iterated this procedure, it would amount to a weighted mean-shift algorithm that converges on cluster centers in S(p). However, in practice, the result visually converges after two steps. To achieve semi-transparent results, filtering module 153 may add the standard multivariate weighting to the input frame I(p) and use σ_(rgb)=80, in order to emphasize similar color samples.
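
By way of illustration only, the following Python sketch combines Equations (8) and (9) for a masked region: the mean of the gathered samples stands in for the missing reference, unmasked samples are weighted by their Gaussian proximity to that mean, and masked samples are suppressed. The per-dimension sigmas other than σ_(rgb) are assumptions.

```python
import numpy as np

def inpaint_weights(samples, mask_values, sigma=None):
    """Equations (8) and (9): approximate reference from the sample mean, then
    Gaussian proximity weights for unmasked samples and zero for masked ones."""
    samples = np.asarray(samples, dtype=float)     # (N, 7) gathered 7D samples
    mask_values = np.asarray(mask_values)          # (N,)  M(s_p): 1 = remove, 0 = keep
    if sigma is None:
        sigma = np.array([55.0] * 3 + [10.0] * 3 + [6.0])  # assumed per-dimension sigmas
    s_ref = samples.mean(axis=0)                   # Equation (8)
    d = (samples - s_ref) / sigma
    w = np.exp(-0.5 * np.sum(d * d, axis=1))       # Equation (9), unmasked branch
    w[mask_values == 1] = 0.0                      # masked samples contribute nothing
    return w, s_ref

# Recomputing s_ref from the weighted samples and repeating once behaves like a
# weighted mean-shift step; as noted above, two steps suffice in practice.
```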

An example of scene-space inpainting is shown at 503. At 503 a, diagram 500 shows an input frame of video 140, including an object to be removed. 503 b shows the frame with masking, indicating the portion of the frame to be removed. 503 c shows the resulting output frame, including the preservation of objects previously occluded by the removed object in input frame 503 a. For inpainting, filtering module 153 may use a parameter value of σ_(rgb)=55.

FIG. 6 shows examples of scene-space video processing effects and related information, according to one implementation of the present disclosure. In some implementations, filtering module 153 may apply a video processing effect of a computational scene-space shutter. A "computational shutter" replaces the process of a camera integrating photons that arrive at a pixel sensor with a controlled post-processing algorithm. By extending this concept into scene-space, video processing application 150 may generate compelling results that are fully consistent over camera motion. In this case, a shutter function, w_(shutter), replaces the weighting function, such as:

$w_{compshutter}(s) = \xi\left( s_{f} \right)$  (10)

where ξ(s_(f)) is a box function in a typical camera. A straightforward example of a scene-space long-exposure shot is shown at 601. At 601 a, an exemplary input frame is shown. The effect of scene-space long exposure is shown at 601 b, where static elements of the frame remain clear, but the water is blurred. For comparison, 601 c shows image-space long exposure, where the whole frame is blurry as a result of camera movement. As opposed to image-space long-exposure shots, scene-space long exposure results in time-varying components becoming blurred while the static parts of the scene remain sharp, despite the moving camera.

Diagram 600 shows action shots at 602 a-c, which are discussed in conjunction with graphs 603 a-c. Graphs 603 a-c show possible alternatives for ξ(s_(f)). If filtering module 153 determines ξ(s_(f)) to be an impulse train, as shown in 603 b, and applies it only in a user-defined scene-space region, video processing application 150 can obtain "action shot" style videos. By using a long-tail decaying function, as shown in graph 603 c, filtering module 153 may create trails of moving objects. Image 602 b depicts an action shot according to the computational shutter having a long falloff. These effects are related to video synopsis, as they give an immediate impression of the motion of a scene. In both cases, the temporally offset content behaves correctly with respect to occlusions and perspective changes. As these methods require depth for the foreground object, video processing application 150 may use depth acquired by a Kinect® sensor.
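
By way of illustration only, the following Python sketch shows three possible shutter functions ξ(s_(f)) of the kind discussed above: a box function (an ordinary shutter), an impulse train, and a long-tail decay. The function names and parameter choices are hypothetical.

```python
import numpy as np

def box_shutter(s_f, t_open, t_close):
    """Box function: an ordinary camera shutter, open between t_open and t_close."""
    return 1.0 if t_open <= s_f <= t_close else 0.0

def impulse_train_shutter(s_f, period, tolerance=0.5):
    """Impulse train (graph 603 b): keeps samples near regularly spaced instants,
    producing 'action shot' style results when applied in a scene-space region."""
    phase = s_f % period
    return 1.0 if phase <= tolerance or (period - phase) <= tolerance else 0.0

def decaying_shutter(s_f, t_now, tau=10.0):
    """Long-tail decay (graph 603 c): past observations fade out gradually,
    creating trails behind moving objects."""
    age = t_now - s_f
    return float(np.exp(-age / tau)) if age >= 0 else 0.0
```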

Inaccurate depth information may make dealing with scene point occlusions difficult. In some implementations, video processing system 100 relies on s_(ref) and scene point redundancy to prevent color bleeding artifacts. However, using this approach for dynamic foreground objects, video processing application 150 can only capture a single observation at a given moment of time. For instances when video processing application 150 has neither a reference sample nor a significant number of samples with which to determine a reasonable prior, video processing application 150 may use the following simple occlusion heuristic to prevent color bleed-through for scenes with reasonable depth values, e.g., from a Kinect®. Filtering module 153 may introduce a sample depth order s_(ord), where s_(ord) is the number of samples in S(p) that are closer to p than the current sample s:

$s_{ord} = \#\left\{ q \in S(p) \;\middle|\; (p - q)^{2} < (p - s)^{2} \right\}$  (11)

The weighting function applied by filtering module 153 becomes:

$w_{action}(s) = \xi\left( s_{f} \right) \exp\left( -\frac{s_{ord}^{2}}{2\,\sigma_{ord}^{2}} \right)$  (12)

In some implementations, filtering module 153 may use σ_(ord)=10 to emphasize the scene points that are closest to the camera used to capture video 140, or that have a depth closest to display 195.
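
By way of illustration only, the following Python sketch computes the depth order s_(ord) of Equation (11) and the action-shot weight of Equation (12) for a batch of samples; the shutter argument is any ξ(s_(f)) such as those sketched earlier, and the function names are hypothetical.

```python
import numpy as np

def action_weights(sample_xyz, sample_f, p_xyz, shutter, sigma_ord=10.0):
    """Equations (11) and (12): rank samples by their scene-space distance to p
    (s_ord) and combine the shutter function with a Gaussian falloff over that rank,
    so that observations nearest the camera dominate and color bleed-through is reduced."""
    sample_xyz = np.asarray(sample_xyz, dtype=float)   # (N, 3) sample positions
    p_xyz = np.asarray(p_xyz, dtype=float)
    dist2 = np.sum((sample_xyz - p_xyz) ** 2, axis=1)
    # s_ord: number of samples strictly closer to p than the current sample
    order = np.array([np.sum(dist2 < d) for d in dist2])
    xi = np.array([shutter(f) for f in sample_f])      # xi(s_f) from Equation (10)
    return xi * np.exp(-(order ** 2) / (2.0 * sigma_ord ** 2))
```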

FIG. 7 shows an example of a scene-space video processing effect, according to one implementation of the present disclosure. Diagram 700 shows a virtual aperture effect of video processing application 150. With appropriate weighting functions, video processing system 100 can also represent complex effects such as virtual apertures, exploiting the existence of samples in a coherent scene-space. To do this, video processing system 100 models an approximate physical aperture in scene-space and weights the sampled scene points accordingly. This allows video processing application 150 to create arbitrary aperture effects, such as focus pulls and focus manifolds defined in scene-space.

At 701, filtering module 153 applies a weighting function for an approximate virtual aperture as a double cone with its thinnest point a₀ at the focal point z₀. The slope a_(s) of the cone defines the size of the aperture as a function of distance from the focal point:

$a(z) = a_{0} + \left| z_{0} - z \right| \cdot a_{s}$  (13)

To avoid aliasing artifacts, video processing system 100 uses the sample area s_(area) introduced previously to weight each sample by the ratio of its size and the aperture size at its scene-space position, because scene points carry the most information at their observed scale.

With r as the distance of s_(xyz) along the camera viewing ray, and q as the distance from the ray to s, filtering module 153 may use a weighting function of the form:

$w_{va}(s) = \begin{cases} \dfrac{s_{area}}{\pi\, a(r)^{2}} & \text{when } q < a(r) \\ 0 & \text{otherwise} \end{cases}$  (14)
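
By way of illustration only, the following Python sketch evaluates Equations (13) and (14) for a batch of samples: each sample's distance along and perpendicular to the viewing ray is computed, the cone radius a(r) is evaluated at that distance, and the sample is weighted by the ratio of its scene-space area to the aperture cross-section. All names and parameters are hypothetical.

```python
import numpy as np

def virtual_aperture_weights(sample_xyz, sample_area, cam_center, view_dir,
                             z0, a0, a_slope):
    """Equations (13) and (14): double-cone aperture with waist a0 at focal
    distance z0; samples inside the cone are weighted by area ratio."""
    sample_xyz = np.asarray(sample_xyz, dtype=float)    # (N, 3)
    sample_area = np.asarray(sample_area, dtype=float)  # (N,)
    view_dir = np.asarray(view_dir, dtype=float)
    view_dir = view_dir / np.linalg.norm(view_dir)
    rel = sample_xyz - np.asarray(cam_center, dtype=float)
    r = rel @ view_dir                                      # distance along the viewing ray
    q = np.linalg.norm(rel - np.outer(r, view_dir), axis=1) # distance from the ray
    a = a0 + np.abs(z0 - r) * a_slope                       # Equation (13): aperture radius a(r)
    return np.where(q < a, sample_area / (np.pi * a ** 2), 0.0)  # Equation (14)
```

Sweeping z0 over time yields a focus pull, and replacing the single viewing ray with a curve or surface in scene-space gives the focus manifolds mentioned above.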

Image 702 shows an exemplary image processed using a synthetic aperture. In some implementations, video processing application 150 may not use multiple viewpoints at the same time instance, but may use scene points sampled from neighboring frames to compute aperture effects.

FIG. 8 shows an exemplary flowchart illustrating a method of scene-space video processing, according to one implementation of the present disclosure. At 811, video processing application 150 samples a plurality of scene points corresponding to an output pixel of a first frame of a plurality of frames of the video, the plurality of scene points including alternate observations of a same scene point from the plurality of neighboring frames of the first frame of the video. In some implementations, neighboring frames may include a frame immediately preceding the first frame in video 140, a frame immediately succeeding the first frame in video 140, a plurality of frames sequentially preceding the first frame in video 140, a plurality of frames sequentially succeeding the first frame in video 140, or a combination of frames preceding and succeeding the first frame in video 140. Scene points may refer to points in the first frame that are visible to a pixel when the first frame is displayed on a display. In some implementations, scene points that are visible in a neighboring frame, but occluded in the first frame, may be included in the sample.

At 812, video processing application 150 filters the plurality of scene points corresponding to the output pixel to determine a color of the output pixel by calculating a weighted combination of the plurality of scene points corresponding to the output pixel. In some implementations, calculating the weighted combination of the plurality of scene points corresponding to the output pixel of the video may determine a video processing effect. At 813, video processing system 100 displays the first frame of the video, including the output pixel, on display 195.

FIG. 9 shows an exemplary flowchart illustrating a method of scene-space sampling, according to one implementation of the present disclosure. At 911, sampling module 151 projects a selected number of frames of the video into scene-space, the selected number of frames including a first frame and neighboring frames of the first frame, the projection into scene-space creating a point cloud including a plurality of cloud points, wherein each cloud point of the plurality of cloud points corresponds to a projection of a scene point of a plurality of scene points that is visible in the selected number of frames, the point cloud determined according to a depth map of each scene point of the plurality of scene points.

At 912, sampling module 151 identifies a frustum defined by the output pixel of the first frame. At 913, sampling module 151 creates a projection including a 2D projection of the frustum and a projection of each cloud point of the plurality of cloud points in the point cloud. At 914, sampling module 151 identifies a plurality of projected cloud points in the projection that fall within the 2D projection of the frustum.

At 915, sampling module 151 maps each projected cloud point of the plurality of projected cloud points that fall within the 2D projection of the frustum into the output frame of the video. At 916, sampling module 151 determines a set of scene points corresponding to the output pixel of the first frame, the set of scene points corresponding to the plurality of projected cloud points that fall within the 2D projection of the frustum and that appear in the output pixel that defines the frustum.

FIG. 10 shows an exemplary flowchart illustrating a method of scene-space filtering, according to one implementation of the present disclosure. At 1011, filtering module 153 identifies a plurality of erroneous observation points in the plurality of scene points corresponding to the output pixel, wherein an erroneous observation point corresponds to an observation including a scene point occlusion, an observation having incorrect 3D information, and an observation of a moving object. At 1012, filtering module 153 calculates a color of the output pixel by applying a weighting function to the plurality of scene points, wherein the weighting function emphasizes scene points of the plurality of scene points that are not erroneous observation points.

From the above description, it is manifest that various techniques can be used for implementing the concepts described in the present application without departing from the scope of those concepts. Moreover, while the concepts have been described with specific reference to certain implementations, a person of ordinary skill in the art would recognize that changes can be made in form and detail without departing from the scope of those concepts. As such, the described implementations are to be considered in all respects as illustrative and not restrictive. It should also be understood that the present application is not limited to the particular implementations described above, but many rearrangements, modifications, and substitutions are possible without departing from the scope of the present disclosure.

What is claimed is:
1. A video processing system for use with a video having a plurality of frames including a first frame and a plurality of neighboring frames of the first frame, the system including: a memory storing a video processing application; and a processor configured to execute the video processing application to: sample a plurality of scene points corresponding to an output pixel of the first frame of the plurality of frames of the video, the plurality of scene points including alternate observations of a same scene point from the plurality of neighboring frames of the first frame of the video; and filter the plurality of scene points corresponding to the output pixel to determine a color of the output pixel by calculating a weighted combination of the plurality of scene points corresponding to the output pixel.
2. The video processing system of claim 1, wherein to filter the plurality of scene points corresponding to the output pixel of the video, the processor is further configured to: identify a plurality of erroneous observation points in the plurality of scene points corresponding to the output pixel, wherein an erroneous observation point corresponds to an observation including a scene point occlusion, an observation having incorrect three-dimensional (3D) information, and an observation of a moving object; and calculate a color of the output pixel by applying a weighting function to the plurality of scene points, wherein the weighting function emphasizes scene points of the plurality of scene points that are not erroneous observation points.
3. The video processing system of claim 1, wherein calculating the weighted combination of the plurality of scene points corresponding to the output pixel of the video determines a video processing effect.
4. The video processing system of claim 3, wherein the video processing effect comprises denoising.
5. The video processing system of claim 3, wherein the video processing effect comprises deblurring.
6. The video processing system of claim 3, wherein the video processing effect comprises super resolution.
7. The video processing system of claim 3, wherein the video processing effect comprises object semi-transparency.
8. The video processing system of claim 3, wherein the video processing effect comprises video inpainting.
9. The video processing system of claim 3, wherein the video processing effect comprises a computational scene-space shutter.
10. The video processing system of claim 1, wherein to sample the plurality of scene points corresponding to the output pixel of the video, the processor is further configured to: project a selected number of frames of the video into scene-space, the selected number of frames including the first frame and neighboring frames of the first frame, the projection into scene-space creating a point cloud including a plurality of cloud points, wherein each cloud point of the plurality of cloud points corresponds to a projection of a scene point of a plurality of scene points that is visible in the selected number of frames, the point cloud determined according to a depth map of each scene point of the plurality of scene points; identify a frustum defined by the output pixel of the first frame; create a projection including a two-dimensional (2D) projection of the frustum and a projection of each cloud point of the plurality of cloud points in the point cloud; identify a plurality of projected cloud points in the projection that fall within the 2D projection of the frustum; map each projected cloud point of the plurality of projected cloud points that fall within the 2D projection of the frustum into the output frame of the video; and determine a set of scene points corresponding to the output pixel of the first frame, the set of scene points corresponding to the plurality of projected cloud points that fall within the 2D projection of the frustum and that appear in the output pixel that defines the frustum.
11. A method of video processing for use by a video processing system including a memory and a processor, the method comprising: sampling, using the processor, a plurality of scene points corresponding to an output pixel of the first frame of the plurality of frames of the video, the plurality of scene points including alternate observations of a same scene point from the plurality of neighboring frames of the first frame of the video; and filtering, using the processor, the plurality of scene points corresponding to the output pixel to determine a color of the output pixel by calculating a weighted combination of the plurality of scene points corresponding to the output pixel.
12. The method of claim 11, wherein the filtering the set of scene points corresponding to the output pixel of the video further comprises: identifying, using the processor, a plurality of erroneous observation points in the plurality of scene points corresponding to the output pixel, wherein an erroneous observation point corresponds to an observation including a scene point occlusion, an observation having incorrect three-dimensional (3D) information, and an observation of a moving object; and calculating, using the processor, a color of the output pixel by applying a weighting function to the plurality of scene points, wherein the weighting function emphasizes scene points of the plurality of scene points that are not erroneous observation points.
13. The method of claim 11, wherein calculating the weighted combination of the plurality of scene points corresponding to the output pixel of the video determines a video processing effect.
14. The method of claim 13, wherein the video processing effect comprises denoising.
15. The method of claim 13, wherein the video processing effect comprises deblurring.
16. The method of claim 13, wherein the video processing effect comprises super resolution.
17. The method of claim 13, wherein the video processing effect comprises object semi-transparency.
18. The method of claim 13, wherein the video processing effect comprises video inpainting.
19. The method of claim 13, wherein the video processing effect comprises computational scene-space shutters.
20. The method of claim 11, wherein sampling the set of scene points corresponding to the output pixel of the video further comprises: projecting, using the processor, a selected number of frames of the video into scene-space, the selected number of frames including the first frame and neighboring frames of the first frame, the projection into scene-space creating a point cloud including a plurality of cloud points, wherein each cloud point of the plurality of cloud points corresponds to a projection of a scene point of a plurality of scene points that is visible in the selected number of frames, the point cloud determined according to a depth map of each scene point of the plurality of scene points; identifying, using the processor, a frustum defined by the output pixel of the first frame; creating, using the processor, a projection including a two-dimensional (2D) projection of the frustum and a projection of each cloud point of the plurality of cloud points in the point cloud; identifying, using the processor, a plurality of projected cloud points in the projection that fall within the 2D projection of the frustum; mapping, using the processor, each projected cloud point of the plurality of projected cloud points that fall within the 2D projection of the frustum into the output frame of the video; and determining, using the processor, a set of scene points corresponding to the output pixel of the first frame, the set of scene points corresponding to the plurality of projected cloud points that fall within the 2D projection of the frustum and that appear in the output pixel that defines the frustum.